本体匹配(OM)在许多领域(例如生物信息学和语义网络)中起着重要作用,其研究变得越来越流行,尤其是在机器学习(ML)技术的应用中。尽管本体论对准评估计划(OAEI)代表了对OM系统进行系统评估的令人印象深刻的努力,但它仍然受到了几个限制,包括对集合映射的评估,次优参考映射以及对基于ML的系统评估的支持有限。为了应对这些限制,我们介绍了五项新的生物医学OM任务,这些任务涉及从Mondo和UMLS提取的本体。每个任务既包括等价和归因匹配;通过人类的策展,本体论修剪等确保参考映射的质量。并提出了一个全面的评估框架,以从基于ML的基于ML和非ML的OM系统从各个角度衡量OM性能。我们报告了不同类型的OM系统的评估结果,以证明这些资源的使用情况,所有这些资源都是在OAEI 2022年新的BioML轨道的一部分中公开使用的。
translated by 谷歌翻译
自动化本体策划是知识工程中的至关重要的任务。通过机器学习技术(例如语义嵌入)的预测是一个有希望的方向,但相关研究仍然是初步的。在本文中,我们提出了一个名为Bertsubs的类集合预测方法,该方法使用预训练的语言模型BERT来计算类标签和自定义输入模板的上下文嵌入,以结合周围类的上下文。对两个大型现实世界的评估表明,其性能比最先进的表现更好。
translated by 谷歌翻译
机器学习方法尤其是深度神经网络取得了巨大的成功,但其中许多往往依赖于一些标记的样品进行训练。在真实世界的应用中,我们经常需要通过例如具有新兴预测目标和昂贵的样本注释的动态上下文来解决样本短缺。因此,低资源学习,旨在学习具有足够资源(特别是培训样本)的强大预测模型,现在正在被广泛调查。在所有低资源学习研究中,许多人更喜欢以知识图(kg)的形式利用一些辅助信息,这对于知识表示变得越来越受欢迎,以减少对标记样本的依赖。在这项调查中,我们非常全面地审查了90美元的报纸关于两个主要的低资源学习设置 - 零射击学习(ZSL)的预测,从未出现过训练,而且很少拍摄的学习(FSL)预测的新类仅具有可用的少量标记样本。我们首先介绍了ZSL和FSL研究中使用的KGS以及现有的和潜在的KG施工解决方案,然后系统地分类和总结了KG感知ZSL和FSL方法,将它们划分为不同的范例,例如基于映射的映射,数据增强,基于传播和基于优化的。我们接下来呈现了不同的应用程序,包括计算机视觉和自然语言处理中的kg增强预测任务,还包括kg完成的任务,以及每个任务的一些典型评估资源。我们最终讨论了一些关于新学习和推理范式的方面的一些挑战和未来方向,以及高质量的KGs的建设。
translated by 谷歌翻译
本体对齐(A.K.A本体匹配(OM))在知识集成中起着关键作用。由于机器学习在许多域中的成功,它已应用于OM。然而,通常采用ad-hoc特征工程或非上下文词嵌入的现有方法尚未表现出基于规则的系统,特别是在无监督的环境中。在本文中,我们提出了一个名为BertMap的新型OM系统,它可以支持无监督和半监督的设置。它首先使用基于微调从本体中提取的文本语义语料上的上下文嵌入模型伯特来预测使用分类器的映射,然后通过利用本体结构和逻辑来通过扩展和修复来改进映射。我们在生物医学本体上的三个对齐任务的评估表明BertMap通常可以比领先的OM系统LogMap和AML更好。
translated by 谷歌翻译
Modelling and forecasting real-life human behaviour using online social media is an active endeavour of interest in politics, government, academia, and industry. Since its creation in 2006, Twitter has been proposed as a potential laboratory that could be used to gauge and predict social behaviour. During the last decade, the user base of Twitter has been growing and becoming more representative of the general population. Here we analyse this user base in the context of the 2021 Mexican Legislative Election. To do so, we use a dataset of 15 million election-related tweets in the six months preceding election day. We explore different election models that assign political preference to either the ruling parties or the opposition. We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods. These results demonstrate that analysis of public online data can outperform conventional polling methods, and that political analysis and general forecasting would likely benefit from incorporating such data in the immediate future. Moreover, the same Twitter dataset with geographical attributes is positively correlated with results from official census data on population and internet usage in Mexico. These findings suggest that we have reached a period in time when online activity, appropriately curated, can provide an accurate representation of offline behaviour.
translated by 谷歌翻译
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
translated by 谷歌翻译
Remote sensing of the Earth's surface water is critical in a wide range of environmental studies, from evaluating the societal impacts of seasonal droughts and floods to the large-scale implications of climate change. Consequently, a large literature exists on the classification of water from satellite imagery. Yet, previous methods have been limited by 1) the spatial resolution of public satellite imagery, 2) classification schemes that operate at the pixel level, and 3) the need for multiple spectral bands. We advance the state-of-the-art by 1) using commercial imagery with panchromatic and multispectral resolutions of 30 cm and 1.2 m, respectively, 2) developing multiple fully convolutional neural networks (FCN) that can learn the morphological features of water bodies in addition to their spectral properties, and 3) FCN that can classify water even from panchromatic imagery. This study focuses on rivers in the Arctic, using images from the Quickbird, WorldView, and GeoEye satellites. Because no training data are available at such high resolutions, we construct those manually. First, we use the RGB, and NIR bands of the 8-band multispectral sensors. Those trained models all achieve excellent precision and recall over 90% on validation data, aided by on-the-fly preprocessing of the training data specific to satellite imagery. In a novel approach, we then use results from the multispectral model to generate training data for FCN that only require panchromatic imagery, of which considerably more is available. Despite the smaller feature space, these models still achieve a precision and recall of over 85%. We provide our open-source codes and trained model parameters to the remote sensing community, which paves the way to a wide range of environmental hydrology applications at vastly superior accuracies and 2 orders of magnitude higher spatial resolution than previously possible.
translated by 谷歌翻译
Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a \emph{generative model}. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy \emph{uniformly} over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.
translated by 谷歌翻译
Natural language interaction is a promising direction for democratizing 3D shape design. However, existing methods for text-driven 3D shape editing face challenges in producing decoupled, local edits to 3D shapes. We address this problem by learning disentangled latent representations that ground language in 3D geometry. To this end, we propose a complementary tool set including a novel network architecture, a disentanglement loss, and a new editing procedure. Additionally, to measure edit locality, we define a new metric that we call part-wise edit precision. We show that our method outperforms existing SOTA methods by 20% in terms of edit locality, and up to 6.6% in terms of language reference resolution accuracy. Our work suggests that by solely disentangling language representations, downstream 3D shape editing can become more local to relevant parts, even if the model was never given explicit part-based supervision.
translated by 谷歌翻译
Neural networks have revolutionized the area of artificial intelligence and introduced transformative applications to almost every scientific field and industry. However, this success comes at a great price; the energy requirements for training advanced models are unsustainable. One promising way to address this pressing issue is by developing low-energy neuromorphic hardware that directly supports the algorithm's requirements. The intrinsic non-volatility, non-linearity, and memory of spintronic devices make them appealing candidates for neuromorphic devices. Here we focus on the reservoir computing paradigm, a recurrent network with a simple training algorithm suitable for computation with spintronic devices since they can provide the properties of non-linearity and memory. We review technologies and methods for developing neuromorphic spintronic devices and conclude with critical open issues to address before such devices become widely used.
translated by 谷歌翻译